On Computing Deltas of RDF Knowledge Bases with Blank Nodes

Authors

  • Christina Lantzaki
  • Yannis Tzitzikas
  • Georgios Georgakopoulos
  • Angelos Bilas
Abstract

The Semantic Web (SW) is an evolving extension of the World Wide Web in which content can be expressed not only in natural language but also in formal languages (e.g. RDF/S) that can be read and used by software agents, permitting them to find, share and integrate information more easily. The semantically structured content is expressed using RDF triples, and a set of such triples constitutes an RDF Knowledge Base (KB), or equivalently an RDF graph. The statement of Heraclitus, "Everything flows, nothing stands still", also holds in the context of the SW, since everything changes (the resources themselves, the ontologies, the resource descriptions, etc.). Consequently, the ability to compute the differences that exist between two RDF KBs, hereafter the Delta, is very important. In particular, Deltas can be employed to (a) help humans understand the evolution of knowledge, and (b) reduce the amount of data that needs to be exchanged and managed over the network in order to build SW synchronization, versioning and replication services. The comparison problem becomes complicated because RDF allows anonymous nodes. An anonymous node, also called a blank node, is a node in an RDF graph that is not identified by a URI and is not a literal. Several RDF KBs rely heavily on blank nodes; e.g. 7.5% of Linked Data are estimated to be blank nodes, while in well-known datasets (e.g. rdfabout.com) the percentage reaches 40%. From a functional perspective, blank nodes are convenient for representing complex attributes or resources whose identity is unknown but whose properties are known. Considering blank nodes as "constants" unique to both graphs helps neither in detecting equivalence between graphs nor in reducing the Delta. On the contrary, matching the blank nodes of the two graphs can significantly reduce the produced Delta. This work is the first study to approach blank node matching as an optimization problem.
The optimization aims at finding the mapping that yields the Delta of minimum size (i.e. with the least number of triples to delete or add to make the graphs equivalent). We prove that in the general case finding the optimal blank node mapping is NP-Hard by reducing it to the sub-graph isomorphism problem. When graphs do not contain directly connected blank nodes (i.e. no triple with more than one blank node exists), we show that the polynomial Hungarian algorithm can be used to find the optimal blank node mapping. For the general case we present various polynomial algorithms returning approximate solutions. One of these algorithms is a variation of the optimal Hungarian algorithm. To make our method applicable also to KBs with a heavy load of blank nodes, we present a signature-based mapping algorithm with N log N time complexity. Finally, for the proposed algorithms we report extensive comparative experimental results, over real and synthetic KBs, regarding Delta reduction (and its deviation from the optimal), equivalence detection, and computational requirements. Indicatively, the algorithms produce a Delta 12 to 7,000 times smaller than without blank node matching. The signature-based algorithm yields Deltas up to 0.34 times bigger than the Hungarian algorithm, but is two orders of magnitude faster; it requires less than 11 seconds to match 150,000 pairs of blank nodes.
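
As a rough illustration of the optimization problem stated in the abstract, the following Python sketch computes the size of a Delta with and without a blank node mapping on a toy example. It is not the authors' implementation: an exhaustive search over mappings stands in for the Hungarian algorithm purely for brevity, and it is only practical for a handful of blank nodes. The function names (`delta_size`, `best_mapping_bruteforce`), the tuple representation of triples, and the convention that blank nodes are strings starting with `_:` with distinct labels in the two graphs are all assumptions made for this example.

```python
from itertools import permutations


def is_blank(term):
    # Assumption: blank nodes are written as strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def blank_nodes(graph):
    # Collect blank nodes that appear as subject or object of any triple.
    return sorted({t for s, _, o in graph for t in (s, o) if is_blank(t)})


def rename(graph, mapping):
    # Rewrite blank node labels according to `mapping`; other terms are kept.
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def delta_size(old, new, mapping=None):
    # Number of triples to delete plus triples to add after renaming `old`.
    renamed = rename(old, mapping or {})
    return len(renamed - new) + len(new - renamed)


def best_mapping_bruteforce(old, new):
    # Exhaustive search over injective (possibly partial) mappings.
    # Exponential cost: a stand-in for illustration, not the Hungarian method.
    b_old, b_new = blank_nodes(old), blank_nodes(new)
    targets = b_new + [None] * len(b_old)  # None means "leave unmatched"
    best, best_size = {}, delta_size(old, new)
    for perm in permutations(targets, len(b_old)):
        mapping = {o: n for o, n in zip(b_old, perm) if n is not None}
        size = delta_size(old, new, mapping)
        if size < best_size:
            best, best_size = mapping, size
    return best, best_size


# Toy KBs with no directly connected blank nodes (hypothetical data).
old_kb = {
    ("_:a", "name", "Alice"),
    ("_:a", "city", "Heraklion"),
    ("_:b", "name", "Bob"),
    ("ex:doc", "author", "_:a"),
}
new_kb = {
    ("_:x", "name", "Bob"),
    ("_:y", "name", "Alice"),
    ("_:y", "city", "Athens"),
    ("ex:doc", "author", "_:y"),
}

naive = delta_size(old_kb, new_kb)  # blank nodes treated as constants
mapping, matched = best_mapping_bruteforce(old_kb, new_kb)
print(naive, matched, mapping)
```

In this toy case every triple involves a blank node, so treating blank nodes as constants forces all four old triples to be deleted and all four new ones to be added, whereas the best mapping leaves only the changed `city` triple in the Delta.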

Similar resources

Blank Node Matching and RDF/S Comparison Functions

In RDF, a blank node (or anonymous resource or bnode) is a node in an RDF graph which is not identified by a URI and is not a literal. Several RDF/S Knowledge Bases (KBs) rely heavily on blank nodes as they are convenient for representing complex attributes or resources whose identity is unknown but their attributes (either literals or associations with other resources) are known. In this paper...

Demonstrating Blank Node Matching and RDF/S Comparison Functions

The ability to compute the differences that exist between two RDF/S Knowledge Bases (for short KBs) is important for aiding humans to understand the evolution of knowledge, and for reducing the amount of data that need to be exchanged and managed over the network in order to build SW synchronization, versioning and replication services [2, 3, 1, 8, 6]. A rather peculiar but quite flexible featu...

Semantics of Constraints in RDFS

We study constraints for RDF-Schema (RDFS) graphs. The syntax and semantics are defined for constraints in graphs that can contain RDFS properties and blank nodes. The proposal for constraint satisfaction closely resembles the possible world approach found in various contexts of incomplete databases and knowledge bases. Positive decidability results for checking satisfaction of RDFS constraints ...

Everything you always wanted to know about blank nodes

In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as ‘existential variables’. We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We pre...

REDD: An Algorithm for Redundancy Detection in RDF Models

The base of Semantic Web specifications is Resource Description Framework (RDF) as a standard for expressing metadata. RDF has a simple object model, allowing for easy design of knowledge bases. This implies that the size of knowledge bases can dramatically increase; therefore, it is necessary to take into account both scalability and space consumption when storing such bases. Some theoretical ...

Publication date: 2013